AITopics | db 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization

Borzechowski, Florian, Schäfer, Michael, Schwarz, Heiko, Pfaff, Jonathan, Marpe, Detlev, Wiegand, Thomas

arXiv.org Artificial Intelligence, Jun-11-2025

The continuous improvements in image compression with variational autoencoders have led to learned codecs that are competitive with conventional approaches in terms of rate-distortion efficiency. Nonetheless, taking the quantization into account during the training process remains a problem, since it produces zero derivatives almost everywhere and needs to be replaced with a differentiable approximation that allows end-to-end optimization. Though there are different methods for approximating the quantization, none of them models the quantization noise correctly, and thus they result in suboptimal networks. Hence, we propose an additional finetuning training step: after conventional end-to-end training, parts of the network are retrained on quantized latents obtained at the inference stage. For entropy-constraint quantizers like Trellis-Coded Quantization, the impact of the quantizer is particularly difficult to approximate by rounding or adding noise, as the quantized latents are interdependently chosen through a trellis search based on both the entropy model and a distortion measure. We show that retraining on correctly quantized data consistently yields additional coding gain for both uniform scalar and especially for entropy-constraint quantization, without increasing inference complexity. For the Kodak test set, we obtain average savings between 1% and 2%, and for the TecNick test set up to 2.2% in terms of Bjøntegaard-Delta bitrate.
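The mismatch described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the unit step size, the function names, and the Gaussian stand-in latents are assumptions made for illustration. It contrasts hard rounding, whose derivative is zero almost everywhere, with the additive uniform noise that is commonly used as a differentiable stand-in during training.

```python
import math
import random

def quantize(y, step=1.0):
    """Uniform scalar quantization: snap y to the nearest multiple of step."""
    return math.floor(y / step + 0.5) * step

def noise_proxy(y, rng, step=1.0):
    """Training-time surrogate (assumed here): add uniform noise of one step width."""
    return y + rng.uniform(-step / 2, step / 2)

def finite_difference(f, y, eps=1e-6):
    """Central finite difference, used to show that rounding has zero slope."""
    return (f(y + eps) - f(y - eps)) / (2 * eps)

rng = random.Random(0)
latents = [rng.gauss(0.0, 2.0) for _ in range(1000)]  # hypothetical latents

hard = [quantize(y) for y in latents]          # what inference actually sees
soft = [noise_proxy(y, rng) for y in latents]  # what noise-based training sees

# Away from the decision thresholds the quantizer is flat.
grad_at_0_3 = finite_difference(quantize, 0.3)
```

Both errors are bounded by half a step, but the rounding error is a deterministic function of the latent while the noise is independent of it, which is one way to see why a network trained on the surrogate can be mismatched to the quantized latents it receives at inference.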

artificial intelligence, machine learning, quantization, (17 more...)

doi: 10.1109/ICIP51287.2024.10648254

2506.08662

Country: Europe > Germany (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Beam's Eye View to Fluence Maps 3D Network for Ultra Fast VMAT Radiotherapy Planning

Arberet, Simon, Ghesu, Florin C., Gao, Riqiang, Kraus, Martin, Sackett, Jonathan, Kuusela, Esa, Kamen, Ali

arXiv.org Artificial Intelligence, Feb-5-2025

Volumetric Modulated Arc Therapy (VMAT) revolutionizes cancer treatment by precisely delivering radiation while sparing healthy tissues. Fluence map generation, crucial in VMAT planning, traditionally involves complex, iterative, and thus time-consuming processes. These fluence maps are subsequently leveraged for leaf sequencing. The deep-learning approach presented in this article aims to expedite this by directly predicting fluence maps from patient data. We developed a 3D network which we trained in a supervised way using a combination of L1 and L2 losses, and RT plans generated by Eclipse and from the REQUITE dataset, taking the RT dose map as input and the fluence maps computed from the corresponding RT plans as target. Our network jointly predicts the 180 fluence maps corresponding to the 180 control points (CP) of single-arc VMAT plans. In order to help the network, we pre-process the input dose by computing the projections of the 3D dose map to the beam's eye view (BEV) of the 180 CPs, in the same coordinate system as the fluence maps. We generated over 2000 VMAT plans using Eclipse to scale up the dataset size. Additionally, we evaluated various network architectures and analyzed the impact of increasing the dataset size. We measure the performance in the 2D fluence map domain using image metrics (PSNR, SSIM), as well as in the 3D dose domain using the dose-volume histogram (DVH) on a validation dataset. The network inference, which does not include data loading and processing, takes less than 20 ms. Using our proposed 3D network architecture and increasing the dataset size using Eclipse improved the fluence map reconstruction performance by approximately 8 dB in PSNR compared to a U-Net architecture trained on the original REQUITE dataset. The resulting DVHs are very close to that of the input target dose.
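To put the reported PSNR gain in context, here is a minimal sketch of the standard PSNR definition and of the mean-squared-error ratio that a given dB gain implies. The peak value of 1.0 and the flat-list image representation are assumptions for illustration; the abstract does not state how the fluence maps are normalized.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR rises by gain_db (same peak)."""
    return 10 ** (gain_db / 10)

# Toy example: a constant error of 0.1 on a unit-peak image.
example_psnr = psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1])

# An improvement of about 8 dB corresponds to this reduction in MSE.
ratio_8db = mse_ratio_for_gain(8.0)
```

Under this definition, a gain of roughly 8 dB means the mean squared error drops by a factor of about 6.3, provided both models are scored against the same peak value.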

artificial intelligence, deep learning, machine learning, (18 more...)

2502.0336

Country: Europe > Germany (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Fast, Scalable, and Robust Deep Learning-based Iterative Reconstruction Framework for Accelerated Industrial Cone-beam X-ray Computed Tomography

Pramanik, Aniket, Rahman, Obaidullah, Venkatakrishnan, Singanallur V., Ziabari, Amirkoushyar

arXiv.org Artificial Intelligence, Jan-21-2025

Cone-beam X-ray Computed Tomography (XCT) with large detectors and corresponding large-scale 3D reconstruction plays a pivotal role in micron-scale characterization of materials and parts across various industries. In this work, we present a novel deep neural network-based iterative algorithm that integrates an artifact reduction-trained CNN as a prior model with automated regularization parameter selection, tailored for large-scale industrial cone-beam XCT data. Our method achieves high-quality 3D reconstructions even for extremely dense thick metal parts - which traditionally pose challenges to industrial CT images - in just a few iterations. Furthermore, we show the generalizability of our approach to out-of-distribution scans obtained under diverse scanning conditions. Our method effectively handles significant noise and streak artifacts, surpassing state-of-the-art supervised learning methods trained on the same data.

artificial intelligence, machine learning, reconstruction, (17 more...)

2501.13961

Country:

North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Industry:

Energy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Government > Regional Government > North America Government > United States Government (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data

Yu, Chengzhou, Fang, Huihui, Wang, Hongqiu, Deng, Ting, Du, Qing, Xu, Yanwu, Yang, Weihua

arXiv.org Artificial Intelligence, Dec-17-2024

Fundus imaging is a critical tool in ophthalmology, with different imaging modalities offering unique advantages. For instance, fundus fluorescein angiography (FFA) can accurately identify eye diseases. However, traditional invasive FFA involves the injection of sodium fluorescein, which can cause discomfort and risks. Generating corresponding FFA images from non-invasive fundus images holds significant practical value but also presents challenges. First, limited datasets constrain the performance and effectiveness of models. Second, previous studies have primarily focused on generating FFA for single diseases or single modalities, often resulting in poor performance for patients with various ophthalmic conditions. To address these issues, we propose a novel latent diffusion model-based framework, Diffusion, which introduces a fine-tuning protocol to overcome the challenge of limited medical data and unleash the generative capabilities of diffusion models. Furthermore, we designed a new approach to tackle the challenges of generating across different modalities and disease types. On limited datasets, our framework achieves state-of-the-art results compared to existing methods, offering significant potential to enhance ophthalmic diagnostics and patient care. Our code will be released soon to support further research in this field.

artificial intelligence, db 0, machine learning, (14 more...)

2412.12778

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction

Zhang, Yuanyuan, Yang, Rui, Yue, Yutao, Lim, Eng Gee

arXiv.org Artificial Intelligence, Oct-11-2024

Millimeter-wave radar promises robust and accurate vital sign monitoring in an unobtrusive manner. However, the radar signal might be distorted in propagation by ambient noise or random body movement, obscuring the subtle cardiac activities and hindering vital sign recovery. In particular, the recovery of the electrocardiogram (ECG) signal relies heavily on the deep-learning model and is sensitive to noise. Therefore, this work creatively deconstructs the radar-based ECG recovery into three individual tasks and proposes a multi-task learning (MTL) framework, radarODE-MTL, to increase the robustness against consistent and abrupt noise. In addition, to alleviate the potential conflicts in optimizing individual tasks, a novel multi-task optimization strategy, eccentric gradient alignment (EGA), is proposed to dynamically trim the task-specific gradients based on task difficulties in orthogonal space. The proposed radarODE-MTL with EGA is evaluated on a public dataset with prominent improvements in accuracy, and the performance remains consistent under noise. The experimental results indicate that radarODE-MTL can reconstruct accurate ECG signals robustly from radar signals and suggest its application prospects in real-life situations. The code is available at: http://github.com/ZYY0844/radarODE-MTL.

ecg recovery, noise, recovery, (16 more...)

2410.08656

Country:

Europe > United Kingdom (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Image Restoration Using Deep Regulated Convolutional Networks

Liu, Peng, Zhou, Xiaoxiao, Li, Yangjunyi, D, El Basha Mohammad, Fang, Ruogu

arXiv.org Artificial Intelligence, Jun-21-2024

While the depth of convolutional neural networks has attracted substantial attention in deep learning research, the width of these networks has recently received greater interest. The width of networks, defined as the size of the receptive fields and the density of the channels, has demonstrated crucial importance in low-level vision tasks such as image denoising and restoration. However, the limited generalization ability caused by the increased width of networks creates a bottleneck in designing wider networks. In this paper, we propose the Deep Regulated Convolutional Network (RC-Net), a deep network composed of regulated sub-network blocks cascaded by skip-connections, to overcome this bottleneck. Specifically, the Regulated Convolution block (RC-block), which combines large and small convolution filters, balances the effectiveness of prominent feature extraction and the generalization ability of the network. RC-Nets have several compelling advantages: they embrace diversified features through large-small filter combinations, alleviate the hazy boundary and blurred details in image denoising and super-resolution problems, and stabilize the learning process. Our proposed RC-Nets outperform state-of-the-art approaches with significant performance gains in various image restoration tasks while demonstrating promising generalization ability. The code is available at https://github.com/cswin/RC-Nets.

db 0, feature extraction, rc-net, (15 more...)

1910.08853

Country:

North America > United States (0.05)
Asia > China (0.04)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FrFT based estimation of linear and nonlinear impairments using Vision Transformer

Jiang, Ting, Gao, Zheng, Chen, Yizhao, Hu, Zihe, Tang, Ming

arXiv.org Artificial Intelligence, Aug-25-2023

To comprehensively assess optical fiber communication system conditions, it is essential to implement joint estimation of the following four critical impairments: nonlinear signal-to-noise ratio (SNR_NL), optical signal-to-noise ratio (OSNR), chromatic dispersion (CD) and differential group delay (DGD). However, current studies identify only a limited number of impairments within a narrow range, due to limitations in network capabilities and the lack of a unified representation of impairments. To address these challenges, we adopt time-frequency signal processing based on the fractional Fourier transform (FrFT) to achieve a unified representation of impairments, while employing a Transformer-based neural network (NN) to break through network performance limitations. To verify the effectiveness of the proposed estimation method, a numerical simulation is carried out on a 5-channel polarization-division-multiplexed quadrature phase shift keying (PDM-QPSK) long-haul optical transmission system with a symbol rate of 50 GBaud per channel. The mean absolute error (MAE) for SNR_NL, OSNR, CD, and DGD estimation is 0.091 dB, 0.058 dB, 117 ps/nm, and 0.38 ps, and the monitoring window ranges from 0~20 dB, 10~30 dB, 0~51000 ps/nm, and 0~100 ps, respectively. Our proposed method achieves accurate estimation of linear and nonlinear impairments over a broad range, representing a significant advancement in the field of optical performance monitoring (OPM).

artificial intelligence, machine learning, natural language, (19 more...)

2308.13575

Country: Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.64)

Industry: Energy > Power Industry (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rethinking complex-valued deep neural networks for monaural speech enhancement

Wu, Haibin, Tan, Ke, Xu, Buye, Kumar, Anurag, Wong, Daniel

arXiv.org Artificial Intelligence, Jan-11-2023

Despite multiple efforts made towards adopting complex-valued deep neural networks (DNNs), it remains an open question whether complex-valued DNNs are generally more effective than real-valued DNNs for monaural speech enhancement. This work is devoted to presenting a critical assessment by systematically examining complex-valued DNNs against their real-valued counterparts. Specifically, we investigate complex-valued DNN atomic units, including linear layers, convolutional layers, long short-term memory (LSTM), and gated linear units. By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance. We also find that the use of complex-valued operations hinders the model capacity when the model size is small. In addition, we examine two recent complex-valued DNNs, i.e. deep complex convolutional recurrent network (DCCRN) and deep complex U-Net (DCUNET). Evaluation results show that both DNNs produce identical performance to their real-valued counterparts while requiring much more computation. Based on these comprehensive comparisons, we conclude that complex-valued DNNs do not provide a performance gain over their real-valued counterparts for monaural speech enhancement, and thus are less desirable due to their higher computational costs.
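The computational cost mentioned in this abstract can be made concrete with a minimal sketch of a complex-valued linear map written in real arithmetic. This is a generic illustration and not the implementation evaluated in the paper; the function name and the toy weights are hypothetical. With W = W_re + i W_im and x = x_re + i x_im, the product is (W_re x_re - W_im x_im) + i (W_re x_im + W_im x_re), so a direct implementation needs four real matrix-vector products.

```python
def complex_matvec(w_re, w_im, x_re, x_im):
    """Apply W = w_re + i*w_im to x = x_re + i*x_im using only real arithmetic."""
    def matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    # Four real matrix-vector products are needed in this direct form.
    rr = matvec(w_re, x_re)
    ii = matvec(w_im, x_im)
    ri = matvec(w_re, x_im)
    ir = matvec(w_im, x_re)

    y_re = [a - b for a, b in zip(rr, ii)]  # real part
    y_im = [a + b for a, b in zip(ri, ir)]  # imaginary part
    return y_re, y_im

# Hypothetical 2x2 weights and a 2-element input.
w_re = [[1.0, 2.0], [0.0, 1.0]]
w_im = [[0.5, 0.0], [1.0, -1.0]]
x_re = [1.0, -1.0]
x_im = [2.0, 0.5]

y_re, y_im = complex_matvec(w_re, w_im, x_re, x_im)
```

A real-valued layer of the same shape needs a single matrix-vector product, which is one way to see where the extra computation of complex-valued layers comes from.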

artificial intelligence, machine learning, speech enhancement, (18 more...)

2301.0432

Country:

Asia (0.14)
North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

Mira, Rodrigo, Vougioukas, Konstantinos, Ma, Pingchuan, Petridis, Stavros, Schuller, Björn W., Pantic, Maja

arXiv.org Artificial Intelligence, Aug-15-2022

Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely "in the wild". We evaluate the generated samples in two different scenarios (seen and unseen speakers) using four objective metrics which measure the quality and intelligibility of artificial speech. We demonstrate that the proposed approach outperforms all previous works in most metrics on GRID and LRW.

spectrogram, speech, video, (15 more...)

doi: 10.1109/TCYB.2022.3162495

2104.13332

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Canonical Polyadic Decomposition and Deep Learning for Machine Fault Detection

Frusque, Gaetan, Michau, Gabriel, Fink, Olga

arXiv.org Machine Learning, Jul-20-2021

Acoustic monitoring for machine fault detection is a recent and expanding research path that has already provided promising results for industries. However, it is impossible to collect enough data to learn all types of faults from a machine. Thus, new algorithms, trained using data from healthy conditions only, were developed to perform unsupervised anomaly detection. A key issue in the development of these algorithms is the noise in the signals, as it impacts the anomaly detection performance. In this work, we propose a powerful data-driven and quasi non-parametric denoising strategy for spectral data based on a tensor decomposition: the Non-negative Canonical Polyadic (CP) decomposition. This method is particularly suited to machines emitting stationary sound. We demonstrate in a case study, the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) baseline, how the use of our denoising strategy leads to a noticeable improvement of the unsupervised anomaly detection. Such approaches are capable of making sound-based monitoring of industrial processes more reliable.
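The CP model underlying this denoising idea can be sketched as follows. The sketch only shows how a three-way tensor is rebuilt from given non-negative factor matrices, X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]. Fitting the factors to data is not shown, and the factor values and the function name are hypothetical.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices with a shared rank."""
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]

# Hypothetical non-negative factors of rank 2.
A = [[1.0, 0.0], [2.0, 1.0]]  # mode-1 factors (2 x 2)
B = [[1.0, 1.0], [3.0, 0.0]]  # mode-2 factors (2 x 2)
C = [[2.0, 1.0]]              # mode-3 factors (1 x 2)

X = cp_reconstruct(A, B, C)   # tensor of shape 2 x 2 x 1
```

Because a low-rank reconstruction keeps only the structure shared across all modes, replacing a noisy spectral tensor with its low-rank CP approximation is one way such a decomposition can act as a denoiser.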

dataset, decomposition, nncp, (15 more...)

2107.09519

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)
